Regularized subspace n-gram model for phonotactic iVector extraction
Authors
Abstract
Phonotactic language identification (LID) based on n-gram statistics and discriminative classifiers is a popular approach to the LID problem. A low-dimensional representation of the n-gram statistics makes a wider range of efficient machine learning techniques applicable to LID. Recently, we proposed the phonotactic iVector as a low-dimensional representation of the n-gram statistics. In this work, we propose an enhanced model of the n-gram probabilities together with regularized parameter estimation. The proposed model consistently improves LID performance across all conditions, by up to 15% relative to the previous state-of-the-art system. The new model also reduces the memory requirements of iVector extraction and helps speed up subspace training. Results are reported in terms of Cavg on the NIST LRE2009 evaluation set.
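The abstract does not spell out the model. What follows is only a minimal sketch of the generic subspace multinomial idea behind phonotactic iVectors, and it rests on stated assumptions. First, n-gram log-probabilities are modelled as `m + T w` followed by a single softmax over all n-grams. Second, the per-utterance iVector `w` is estimated by gradient ascent on the multinomial log-likelihood of the n-gram counts with an L2 (Gaussian-prior) penalty. This is not the paper's enhanced model, and it does not reproduce the paper's regularization scheme. The function names, the penalty weight `lam`, the learning rate `lr`, and the toy counts are all hypothetical.

```python
import math

def log_softmax(a):
    """Numerically stable log-softmax of a list of floats."""
    mx = max(a)
    lse = mx + math.log(sum(math.exp(x - mx) for x in a))
    return [x - lse for x in a]

def ngram_logprobs(m, T, w):
    """Hypothetical model: log phi_i = softmax(m_i + t_i . w) over all n-grams."""
    a = [m_i + sum(t_ik * w_k for t_ik, w_k in zip(t_i, w))
         for m_i, t_i in zip(m, T)]
    return log_softmax(a)

def objective(counts, m, T, w, lam):
    """Multinomial log-likelihood of the n-gram counts minus an L2 penalty on w."""
    logp = ngram_logprobs(m, T, w)
    ll = sum(c * lp for c, lp in zip(counts, logp))
    return ll - 0.5 * lam * sum(x * x for x in w)

def gradient(counts, m, T, w, lam):
    """d objective / d w = sum_i (c_i - N * phi_i) * t_i - lam * w."""
    phi = [math.exp(lp) for lp in ngram_logprobs(m, T, w)]
    n_total = sum(counts)
    g = [-lam * x for x in w]
    for c_i, p_i, t_i in zip(counts, phi, T):
        r = c_i - n_total * p_i
        for k in range(len(w)):
            g[k] += r * t_i[k]
    return g

def extract_ivector(counts, m, T, lam=1.0, lr=0.01, steps=500):
    """Estimate w by plain gradient ascent (the objective is concave in w)."""
    w = [0.0] * len(T[0])
    for _ in range(steps):
        g = gradient(counts, m, T, w, lam)
        w = [x + lr * g_k for x, g_k in zip(w, g)]
    return w

# Toy example (hypothetical): 4 n-grams, a 2-dimensional subspace.
m = [0.0, 0.0, 0.0, 0.0]
T = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
counts = [6, 2, 1, 1]
w_hat = extract_ivector(counts, m, T)
print(w_hat)
```

With `m` and `T` fixed, `w_hat` is the low-dimensional vector that a downstream classifier (for example, logistic regression or an SVM) would consume in place of the raw n-gram counts. In a real system, `m` and `T` would themselves be trained on many utterances. Raising `lam` pulls `w` toward zero, which is one simple way to regularize the estimate.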
Similar resources
iVector Approach to Phonotactic Language Recognition
This paper addresses a novel technique for representation and processing of n-gram counts in phonotactic language recognition (LRE): subspace multinomial modelling represents the vectors of n-gram counts by low dimensional vectors of coordinates in total variability subspace, called iVector. Two techniques for iVector scoring are tested: support vector machines (SVM), and logistic regression (L...
Dimensionality Reduction for Using High-Order n-Grams in SVM-Based Phonotactic Language Recognition
SVM-based phonotactic language recognition is state-of-the-art technology. However, due to computational bounds, phonotactic information is usually limited to low-order phone n-grams (up to n = 3). In a previous work, we proposed a feature selection algorithm, based on n-gram frequencies, which allowed us to work successfully with high-order n-grams on the NIST 2007 LRE database. In this work, we ...
Language Recognition on Albayzin 2010 LRE using PLLR features
Phone Log-Likelihood Ratios (PLLR) have been recently proposed as alternative features to MFCC-SDC for iVector Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided, with a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band multi-speaker TV broadcast speech on si...
Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit
This report describes implementation of the standard i-vector-PLDA framework for the Kaldi speech recognition toolkit. The current existing speaker recognition system implementation is based on the Subspace Gaussian Mixture Model (SGMM) technique although it shares many similarities with the standard implementation. In our implementation, we modified the code so that it mimics the standard algo...